Acoustical correlates of stress can only be evaluated 
in comparison with some "standard" specifying which syllables are 
actually stressed. The standard should be consistent from time to 
time, and largely independent of talker and listener idiosyncrasies. 
Three phonetically-trained subjects listened repeatedly to spoken 
texts and spontaneous sentences until they could categorize each 
syllable as either stressed, unstressed, or reduced. This procedure 
was repeated three times for each speech text and listener. Two 
listeners differed from each other on only 5% of all syllables as to 
whether they were perceived as stressed or not. Each showed about 5% 
confusion in decisions about stressed syllables from one trial to 
another. Unstressed and reduced levels were confused more frequently. 
The third listener gave less consistent results. Subjects' judgments 
of stress when given only the written text were of comparable 
consistency, but did not correspond well with perceptions with speech 
if the speech was spontaneous rather than read from prepared texts. Stress 
perceptions consequently may be suitable for evaluating acoustical 
correlates to within a 5% tolerance in overall location scores. 
Pooling the perceptions from several trials and several listeners may 
improve the stability of this "standard" for stress assignment. 
(Author/DD) 

PERCEIVED STRESS AS THE "STANDARD" 
FOR JUDGING ACOUSTICAL CORRELATES OF STRESS 



Wayne A. Lea 

Acoustical correlates of stress can only be evaluated in comparison 
with some "standard" specifying which syllables are actually stressed. 
For studies of isolated words, such as minimal pairs of noun versus 
verb, a desk dictionary or a researcher's own intuitions may be sufficient. 
However, for studies of the stress patterns throughout sentences and 
discourses, the "standard" for stress assignment is not as readily 
established. I will report here on some experiments regarding the 
effectiveness and stability of listeners' perceptions of stressed, 
unstressed, and reduced syllables in continuous speech. 

The procedure used in the present study was to have an individual 
repeatedly hear tape recordings through earphones and mark, for each 
syllable, whether he heard that syllable as stressed, unstressed, or 
reduced. The listener could listen to portions of the tape as often as 
necessary, until he could mark each syllable. He was free to back up 
the tape at his choice, and no time limits or procedural constraints were 
placed on him. The listeners did endeavor to rewind far enough to 
always hear an entire clause or sentence, so as to have a constant context 
within which to judge relative stress levels. 

Slide one illustrates the method for recording a listener's perceptions. 
To facilitate marking each syllable, the script of each recording 
was typed on a sheet of paper, with vertical slashes between syllables. 
The listener received one such sheet for each recorded text, and a 
mark (such as S, U, or R) was required for each syllable. 
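For readers who want to tabulate such sheets by machine, the Python sketch below pairs each syllable of a typed script with its mark. It assumes a hypothetical encoding that was not part of the study: syllables separated by vertical bars, as on the typed sheets, and the listener's marks written as a space-separated string of S, U, and R. The function name `parse_sheet` is likewise illustrative.

```python
from typing import List, Tuple

VALID_MARKS = {"S", "U", "R"}  # stressed, unstressed, reduced


def parse_sheet(script: str, marks: str) -> List[Tuple[str, str]]:
    """Pair each bar-separated syllable of a typed script with one mark.

    Hypothetical encoding (not from the study): "When|the|sun|light"
    for the script and "U R S U" for the listener's marks.
    """
    syllables = [s.strip() for s in script.split("|") if s.strip()]
    labels = marks.split()
    # Every syllable on the sheet must receive exactly one mark.
    if len(syllables) != len(labels):
        raise ValueError(
            f"{len(syllables)} syllables but {len(labels)} marks; "
            "each syllable needs exactly one mark"
        )
    for label in labels:
        if label not in VALID_MARKS:
            raise ValueError(f"unknown mark {label!r}; expected S, U, or R")
    return list(zip(syllables, labels))
```

For instance, `parse_sheet("When|the|sun|light|strikes", "U R S U S")` returns five (syllable, mark) pairs; those marks are invented for illustration. Rejecting a sheet with a missing or unknown mark mirrors the requirement that every syllable receive a mark.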

Three phonetically-trained listeners were used in this study. An 
earlier study showed that two of these listeners gave stress 
perceptions similar to those of four other listeners used in experiments 
previously reported on by Li, Hughes and Snow. Each listener repeated 
the perception test at least three times, to determine listener 
consistency from one time to another. The listeners were also asked 
to report their stress judgments when given only the written text (with 
no tape recordings). These judgments with no speech were also obtained 
in three repetitions, to test their repeatability. 

Li, K.-P., Hughes, G. W., and Snow, T. B. (1973), Segment Classification in 
Continuous Speech, IEEE Trans. on Audio and Electroacoustics, Vol. AU-21, 
No. 1, pp. 50-57. 

Speech texts used in this study included a paragraph of the Rainbow 
Script read by six talkers, a script composed only of monosyllabic words 
read by two talkers, and 31 spontaneous sentences intended for man-computer 
interaction, which involved recordings by ten different talkers at 
several contractors within the ARPA Speech Understanding Research program. 
With the several repetitions by several listeners, this yielded over 
17,000 judgments of stress levels for syllables in connected texts 
spoken by sixteen talkers. 

In the next slide, we see plots of majority votes about the stress 
level for each syllable in several portions of texts. The majority vote 
from a listener's three repetitions of the listening test was first 
found. For example, on two trials he may perceive the word "strikes" as 
stressed, while on the third he hears that syllable as unstressed. His 
majority vote is then stressed. Then the results for all three listeners 
were pooled, by plotting a stress score as the number of listeners whose 
majority vote says the syllable is stressed, minus the number whose majority 
vote says the syllable is reduced. Unstressed judgments were assigned a 
value of zero. Thus, a plus 2 score for syllables like the word "strikes" 
indicates that two of the listeners heard the syllable as stressed, 
while the third listener perceived that syllable as unstressed. A companion 
study, reported on in another paper at this meeting (Lea, 1973), showed 
that about 85% of the syllables perceived as stressed by two or more 
listeners (that is, those which had a stress score of +2 or +3) were 
correctly found by an algorithm for locating stressed syllables from 
acoustic data. 
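The two-stage pooling described above (a majority within each listener, then a signed count across listeners) can be written out as a short Python sketch. The function names are hypothetical, and so is the tie rule: with three trials and three categories a listener could mark a syllable S, U, and R once each, and since the handling of that case is not described here, the sketch treats such a syllable as unstressed.

```python
from collections import Counter
from typing import Sequence


def majority(trials: Sequence[str]) -> str:
    """Majority S/U/R label across one listener's repeated trials.

    Hypothetical tie rule: if no label has a strict majority (e.g. S, U, R
    once each), the syllable is treated as unstressed ("U").
    """
    if not trials:
        raise ValueError("need at least one trial")
    label, count = Counter(trials).most_common(1)[0]
    return label if count > len(trials) / 2 else "U"


def stress_score(listener_trials: Sequence[Sequence[str]]) -> int:
    """Listeners whose majority is stressed minus listeners whose majority
    is reduced; unstressed majorities contribute zero."""
    votes = [majority(trials) for trials in listener_trials]
    return votes.count("S") - votes.count("R")
```

With one listener's trials S, S, U (majority stressed), a second listener's S, S, S, and a third listener's U, U, S (majority unstressed), `stress_score` returns 2, the score described for "strikes". These trial values are illustrative, not data from the study.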

Listeners obviously did not always agree about the stress level of 
a syllable. The next slide shows, for each pair of listeners 
and each text, the percentages of majority stress judgments that differ 
from one listener to another. Listeners MFM and TES disagree about the 
stress levels they assign to about 50% of all syllables in each 
of the texts. The percentages of listener-to-listener confusions are 
not drastically affected by the talker or text. Even the percentages 
of confusions with NO speech don't differ much from those with speech. 
(I should emphasize at this point that listener TES is quite unusual; 
most pairs of listeners have exhibited something more like the 20 to 30% 
confusions between listeners WAL and MFM.) 
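The listener-pair percentages amount to counting, over the syllables of a text, how often two listeners' majority judgments differ. A minimal Python sketch, assuming each listener's majority judgments for one text are stored as an equal-length sequence of S/U/R labels keyed by listener initials (a hypothetical data layout; the function names are also illustrative):

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def percent_disagreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of syllables on which two S/U/R label sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * differing / len(a)


def pairwise_disagreement(
    majorities: Dict[str, Sequence[str]]
) -> Dict[Tuple[str, str], float]:
    """Disagreement percentage for every pair of listeners on one text.

    Keys are listener pairs in alphabetical order, e.g. ("MFM", "WAL").
    """
    return {
        (p, q): percent_disagreement(majorities[p], majorities[q])
        for p, q in combinations(sorted(majorities), 2)
    }
```

With made-up labels `{"WAL": "SURSU", "MFM": "SUUSU", "TES": "URRSS"}`, WAL and MFM differ on one syllable of five, giving 20.0. The same `percent_disagreement` comparison applies unchanged to a listener's judgments with and without speech.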

In the next slide, the confusions between stressed and unstressed 
levels of perception have been separated from the confusions between 
unstressed and reduced syllables, for listeners WAL vs MFM. About 5% 
of all syllables are confused between stressed and unstressed by the 
two listeners WAL vs MFM, as shown by the cross-hatched bars, 
while 15 to 25% of all syllables were confused between unstressed and 
reduced categories, as shown by the blank bars. Thus, these two listeners 
agree quite well about which are the stressed syllables, while they do 
not as consistently agree about which are the reduced syllables. 
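Separating the two kinds of confusion only requires classifying each disagreement by the pair of labels involved. The Python sketch below does this for two listeners' majority judgments; the function name is hypothetical. It also reports direct stressed/reduced swaps, which the text does not discuss, and adds them to the stressed/unstressed count to give a "stressed or not" figure like the one quoted in the abstract, on the assumption that any disagreement involving S counts toward that figure.

```python
from typing import Dict, Sequence

ORDER = "SUR"  # canonical order so that ("U", "S") and ("S", "U") share a key


def confusion_breakdown(a: Sequence[str], b: Sequence[str]) -> Dict[str, float]:
    """Classify disagreements between two S/U/R sequences by type.

    Each value is a percentage of all syllables. Labels outside S/U/R
    raise ValueError (from str.index).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    counts = {"S/U": 0, "U/R": 0, "S/R": 0}
    for x, y in zip(a, b):
        if x != y:
            key = "/".join(sorted((x, y), key=ORDER.index))
            counts[key] += 1
    pct = {k: 100.0 * v / len(a) for k, v in counts.items()}
    # Any disagreement involving S changes the stressed/not-stressed decision.
    pct["stressed_or_not"] = pct["S/U"] + pct["S/R"]
    return pct
```

On the made-up pair `"SUURSUUUUR"` and `"UURRSURUUU"`, one syllable is an S/U confusion (10%) and three are U/R confusions (30%), the same qualitative pattern as the cross-hatched versus blank bars.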

How a listener's perceptions differ from time to time is shown in 
the next slide. As shown by the cross-hatched bars, listener MFM 
confused about 1 to 5% of the syllables between levels of stressed and 
unstressed from one trial to another. His confusions between unstressed 
and reduced levels were much more frequent. 
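Trial-to-trial consistency can be measured the same way, comparing one listener's trials with each other rather than two listeners. How the three trials were combined is not stated here; the Python sketch below assumes the percentages are averaged over every pair of trials (first vs second, first vs third, second vs third). The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence


def trial_to_trial(trials: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Average S/U and U/R confusion percentages over all pairs of one
    listener's trials (pairwise averaging is an assumption)."""
    if len(trials) < 2:
        raise ValueError("need at least two trials")
    n = len(trials[0])
    if n == 0 or any(len(t) != n for t in trials):
        raise ValueError("trials must be non-empty and of equal length")
    totals = {"S/U": 0, "U/R": 0}
    pairs = list(combinations(trials, 2))
    for a, b in pairs:
        for x, y in zip(a, b):
            if {x, y} == {"S", "U"}:
                totals["S/U"] += 1
            elif {x, y} == {"U", "R"}:
                totals["U/R"] += 1
    # Normalize by syllables times trial pairs, giving a mean percentage.
    return {k: 100.0 * v / (n * len(pairs)) for k, v in totals.items()}
```

For two made-up trials `"SSUR"` and `"SUUR"`, one of four syllables flips between stressed and unstressed, so the sketch reports 25% S/U and 0% U/R.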

The next slide shows that confusions between the majority 
judgments of a listener with speech and his majority judgments without 
the speech were more frequent if the speech was spontaneously spoken, 
as the ARPA sentences were. Particularly for spontaneous speech, 
then, stress locations from acoustical correlates can be judged more 
reliably against stress perceptions obtained with speech recordings than 
against simple judgments based only on orthographic transcriptions. 

The 31 ARPA sentences involve declarative sentences, commands, 
questions requiring yes/no answers, and questions with interrogative 
WH-words (who, where, which, what). The next slide shows that confusions 
(from trial to trial) were more frequent in questions than in declaratives 
or commands, with yes/no questions yielding the most confusions, and 
declaratives yielding the fewest confusions. 
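Comparing sentence types means pooling such trial-to-trial changes separately for each type. In the Python sketch below, each sentence is given as a tuple of its type and the labels from two trials. That layout, the type names, and the choice to pool by syllable (so longer sentences weigh more) rather than averaging per-sentence rates are all assumptions for illustration, as is the function name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def confusion_by_type(
    sentences: Iterable[Tuple[str, Sequence[str], Sequence[str]]]
) -> Dict[str, float]:
    """Percentage of syllables whose label changed between two trials,
    pooled over all sentences of each type."""
    changed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for kind, first, second in sentences:
        if len(first) != len(second):
            raise ValueError(f"trials differ in length for a {kind} sentence")
        changed[kind] += sum(1 for x, y in zip(first, second) if x != y)
        total[kind] += len(first)
    return {kind: 100.0 * changed[kind] / total[kind] for kind in total if total[kind]}
```

Ranking the returned percentages would reproduce the kind of ordering reported above (yes/no questions highest, declaratives lowest) if the data showed it; the sketch itself makes no claim about the actual values.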
We may conclude from these studies that while the stress perception 
methods used here are generally quite consistent from time to time and 
listener to listener, they cannot be used to judge the effectiveness of 
stressed syllable location from acoustic data to any precision better 
than about 5% tolerance. Thus, if a stressed syllable location algorithm 
could locate 95% of all syllables perceived as stressed by majority votes 
of two or more listeners, it would be doing as well as one repetition 
of the perception tests would do for predicting the perceptions from a 
second repetition of the experiment. It would also be doing as well as 
one listener would do in comparison to another listener. Our standard 
thus has on the order of a 5% tolerance and, when using this standard, 
we can demand no better precision in stressed syllable location from 
acoustic data. 
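The 5% criterion can be expressed as a simple check on an algorithm's detection rate. In the Python sketch below, `scores` holds the pooled stress score for each syllable and `detected` marks whether a hypothetical acoustic algorithm flagged that syllable as stressed; only syllables with a score of +2 or +3 count as targets. The function names and the `tolerance` parameter are illustrative. The check considers only missed stressed syllables, not false alarms, since that is the figure the conclusion discusses.

```python
from typing import Sequence


def detection_rate(scores: Sequence[int], detected: Sequence[bool]) -> float:
    """Percentage of perceptually stressed syllables (pooled score +2 or +3)
    that the algorithm flagged as stressed."""
    if len(scores) != len(detected):
        raise ValueError("scores and detections must align syllable by syllable")
    targets = [d for s, d in zip(scores, detected) if s >= 2]
    if not targets:
        raise ValueError("no syllable has a stress score of +2 or more")
    return 100.0 * sum(targets) / len(targets)


def within_tolerance(rate: float, tolerance: float = 5.0) -> bool:
    """True if the rate is as good as the perceptual standard allows,
    i.e. misses no more than the ~5% listener inconsistency."""
    return rate >= 100.0 - tolerance
```

An algorithm that finds 19 of 20 target syllables scores 95.0 and passes `within_tolerance`; any lower rate leaves room for improvement that the perceptual standard can still measure.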



[Slide 1: method for recording a listener's perceptions]

[Slide 3]

[Slide 4]